Veera Raghavendra Chikka, International Institute of Information Technology Hyderabad, raghavendra.ch@research.iiit.ac.in PRIMARY
Kamalakar Karlapalem (Advisor), International Institute of Information Technology Hyderabad, kamal@iiit.ac.in
Student Team: YES
Semantic Parsing software of Lund University: semantic parsing software using PropBank and NomBank frames
SIMILE project of MIT: interactive tool for displaying a timeline and descriptions of events.
D3.js: JavaScript library for web visualizations.
R programming language: R is a language and environment for statistical computing and graphics.
Approximately how many hours were spent working on this submission in total? 180 hours
May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2014 is complete? YES
Video:
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Questions
MC1.1 – Provide a visual representation of the structure of the Protectors of Kronos network, with supporting evidence.
a. Who are the leaders?
b. Who is part of the extended network?
c. How has the group structure and organization changed over time?
d. Where are the potential connections between the POK and GAStech?
Provide novel visualizations appropriate for communicating key information to the busy leaders of the investigation. Please limit your response to no more than eight images and 500 words.
After a thorough literature survey, we identified several strategies for approaching an investigative analysis problem. The strategy we followed for this VAST Challenge is referred to as "Find a clue, follow the trail".
Our work started with skimming through the dataset (mainly the historical documents and news reports). The historical documents contain crucial information about the initial roots of the POK structure. We then mined information from the news reports with the Stanford NER tool, which made the task much easier: it opens each file and tags PERSON, LOCATION and ORGANIZATION entities. Figure 1.1 shows a modified version of the stanford-ner-gui tool with a tagged file; the rightmost frame lists all PERSON entities of the given news reports.
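The entity list in the rightmost frame of Fig 1.1 can be approximated with a short script. The sketch below is not the modified GUI we used; it assumes the tagged reports were saved in Stanford NER's inline-XML form (e.g. `<PERSON>Henk Bodrogi</PERSON>`), and the function name `extract_entities` is hypothetical.

```python
import re
from collections import Counter

# Matches inline-XML spans such as <PERSON>Henk Bodrogi</PERSON>;
# \1 forces the closing tag to match the opening one, DOTALL lets a span cross a line break.
TAG_RE = re.compile(r"<(PERSON|LOCATION|ORGANIZATION)>(.*?)</\1>", re.DOTALL)


def extract_entities(tagged_text):
    """Map each entity type to a Counter of the entity strings found."""
    entities = {label: Counter() for label in ("PERSON", "LOCATION", "ORGANIZATION")}
    for label, name in TAG_RE.findall(tagged_text):
        # Collapse internal whitespace so "Henk  Bodrogi" and "Henk Bodrogi" merge.
        entities[label][" ".join(name.split())] += 1
    return entities


if __name__ == "__main__":
    sample = ("<PERSON>Henk Bodrogi</PERSON> was a founding member of the "
              "<ORGANIZATION>Protectors of Kronos</ORGANIZATION>.")
    for label, counts in extract_entities(sample).items():
        print(label, counts.most_common())
```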
Then we used the R programming language to analyse the NER-tagged news reports with several kinds of visualizations, such as adjacency matrices, word clouds and parallel coordinates, to find relationships among different entities.
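As a rough illustration of the adjacency-matrix view, the Python sketch below (our analysis itself was done in R) counts how many reports mention each pair of entities. The input format, one set of entity names per report, and the function name `cooccurrence_matrix` are assumptions.

```python
from itertools import combinations


def cooccurrence_matrix(docs_entities):
    """Build a symmetric entity co-occurrence (adjacency) matrix.

    docs_entities: list of sets, one set of entity names per news report.
    Returns (names, matrix) where matrix[i][j] is the number of reports
    mentioning both names[i] and names[j]; the diagonal stays 0.
    """
    names = sorted(set().union(*docs_entities))
    index = {name: i for i, name in enumerate(names)}
    matrix = [[0] * len(names) for _ in names]
    for entities in docs_entities:
        # Each unordered pair of entities in one report adds one co-mention.
        for a, b in combinations(sorted(entities), 2):
            i, j = index[a], index[b]
            matrix[i][j] += 1
            matrix[j][i] += 1
    return names, matrix


if __name__ == "__main__":
    reports = [
        {"Henk Bodrogi", "Carmine Osvaldo"},
        {"Henk Bodrogi", "Carmine Osvaldo", "Jeroen Karel"},
        {"Silvia Marek"},
    ]
    names, matrix = cooccurrence_matrix(reports)
    for name, row in zip(names, matrix):
        print(f"{name:16}", row)
```

The resulting matrix can be fed to any heatmap or network plotting tool; entities that never co-occur with others (here a single-entity report) simply get an all-zero row.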
Fig 1.1: Modified Stanford NER GUI
a. Members of a government or a company usually have designations that can be used to infer a person's role. The Protectors of Kronos (POK), a private organisation, has only two levels of designation, namely leader and member. We therefore devised a few rule-based patterns for identifying POK leaders. With these patterns we found three leaders of POK in different periods: Henk Bodrogi, Elian Karel and Silvia Marek.
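The exact rule-based patterns are not listed here, so the following is a hypothetical sketch of the idea: two regular expressions that capture a capitalised two-word name next to the word "leader", with mentions counted per name. The patterns and the names `LEADER_PATTERNS` and `find_leader_mentions` are illustrative assumptions, not the rules we used.

```python
import re
from collections import Counter

# A capitalised two-word name, e.g. "Silvia Marek" (assumption: names with
# middle initials such as "Ale L. Hanne" are not covered by this sketch).
NAME = r"([A-Z][a-z]+\s+[A-Z][a-z]+)"

LEADER_PATTERNS = [
    # "POK leader Elian Karel ..."
    re.compile(r"\b[Ll]eader\s+" + NAME),
    # "Silvia Marek, the leader of the POK ..."
    re.compile(NAME + r",\s+(?:the\s+)?(?:new\s+)?leader\s+of\s+(?:the\s+)?"
               r"(?:POK|Protectors of Kronos)"),
]


def find_leader_mentions(sentences):
    """Count how often each name is matched by any leader pattern."""
    counts = Counter()
    for sentence in sentences:
        for pattern in LEADER_PATTERNS:
            for name in pattern.findall(sentence):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    print(find_leader_mentions(["POK leader Elian Karel was arrested."]))
```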
b. We combined the information extracted from historical documents, news reports and EmployeeRecords to find the extended network of POK, which includes Michale Kraft, Isia Vann and Hennie Osvaldo. The whole structure of the organisation under Silvia Marek is shown in Figure 1.3.
c. From the above analysis we obtained the POK leaders, their networks and their timelines. We provide a composite, interactive visualization that comprises the POK leaders, their timeline and the structure of POK, as shown below.
Fig 1.2: POK leaders, their timeline and structure of POK
Initial Stage: Formation of the Grassroots Effort
When it became clear that the Council could not agree on a course of action, the Elodis citizens decided to address the problem independently. At this point the SMO was in the initial stages of development, still a group of citizens with similar concerns. The primary actors in the initial stage formed the seven founding members of the Protectors of Kronos SMO: Henk Bodrogi, Carmine Osvaldo, Ale L. Hanne, Jeroen Karel, Valentine Mies, Yanick Cato and Joreto Katell.
"Profile of Dominant POK Personalities" section.
d. We found a few hints in the EmployeeRecords of GAStech employees who had been discharged for misconduct. GAStech employees with such a record who share a last name with POK members are considered potential connections between GAStech and POK. They are marked in red in Fig 1.3.
GAStech employees whose last name matches a POK member, who work in "Security" and who received a "General Discharge". That is:
"Kronos" AND "Security" AND "General Discharge" AND "Last Name"
Osvaldo,Hennie,31/05/1988,Kronos,Male,Kronos,BirthNation,31/05/1988,,,,Security,Perimeter Control,07/06/2011,Hennie.Osvaldo@gastech.com.kronos,ArmedForcesOfKronos,GeneralDischarge,01/10/2010
Vann,Isia,13/12/1986,Kronos,Male,Kronos,BirthNation,13/12/1986,,,,Security,Perimeter Control,14/12/2007,Isia.Vann@gastech.com.kronos,ArmedForcesOfKronos,GeneralDischarge,01/10/2007
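The filter above can be expressed in a few lines. This sketch assumes the EmployeeRecords rows have the column layout seen in the two sample rows (last name in field 0, first name in field 1, the first "Kronos" field at position 3, employment type at 11, discharge type at 16); the constant names and `potential_pok_links` are hypothetical.

```python
import csv
import io

# Field positions inferred from the two sample rows (assumption, not a documented schema).
LAST_NAME, FIRST_NAME, COUNTRY, EMPLOYMENT_TYPE, DISCHARGE_TYPE = 0, 1, 3, 11, 16


def potential_pok_links(csv_text, pok_last_names):
    """Return (last, first) for employees matching
    "Kronos" AND "Security" AND "General Discharge" AND a POK last name."""
    matches = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) <= DISCHARGE_TYPE:
            continue  # skip blank or malformed rows
        if (row[COUNTRY] == "Kronos"
                and row[EMPLOYMENT_TYPE] == "Security"
                and row[DISCHARGE_TYPE] == "GeneralDischarge"
                and row[LAST_NAME] in pok_last_names):
            matches.append((row[LAST_NAME], row[FIRST_NAME]))
    return matches
```

Passing the set of last names of known POK members (for example the founding members listed in Fig 1.2) as `pok_last_names` reproduces the AND-query shown above.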
Fig 1.3: POK structure under the leadership of Silvia Marek. The level of red marking indicates the strength of a member's potential connection with the GAStech organisation.
MC1.2 – Describe the events of January 20-21, 2014. What is the timeline of events? Please limit your response to no more than ten images and 500 words.
We collected all the news reports (about 265 entries) published on January 20 and January 21 into a separate workspace. A few news reports do not carry published-time information. Assuming that two news reports containing exactly the same information must have been published at the same time, we inferred the published time of these undated articles: we used a TF-IDF weighted vector space model to find, for each undated report, the most similar dated article, and assigned its published time to the undated one.
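A minimal sketch of the time-inference step, assuming raw term frequency multiplied by idf = log(N/df), cosine similarity, and a hypothetical similarity cut-off of 0.8 below which no time is assigned (the cut-off we used is not stated here). All function names are illustrative.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: tf * idf} dict per document, with idf = log(N / df)."""
    counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    df = Counter(term for c in counts for term in c)
    return [{t: tf * math.log(n / df[t]) for t, tf in c.items()} for c in counts]


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def fill_missing_times(articles, threshold=0.8):
    """articles: list of (text, time_or_None).

    Each missing time is copied from the most similar dated article,
    provided their cosine similarity reaches `threshold`; otherwise it stays None.
    """
    vectors = tfidf_vectors([text for text, _ in articles])
    times = [t for _, t in articles]
    filled = list(times)
    for i, t in enumerate(times):
        if t is not None:
            continue
        best_j, best_sim = None, 0.0
        for j, other in enumerate(times):
            if other is None:
                continue  # only dated articles can donate a time
            sim = cosine(vectors[i], vectors[j])
            if sim > best_sim:
                best_j, best_sim = j, sim
        if best_j is not None and best_sim >= threshold:
            filled[i] = times[best_j]
    return filled
```

Leaving a report undated when nothing is similar enough avoids forcing a time onto it; a manual check of the remaining gaps is one option.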
We define an event as a significant occurrence involving an agent (a person or anything else that plays some role) at a determinable time.
Event detection has long been an important research task. After studying many research works, we devised a new approach that combines an IR technique (bursty n-gram words) with an NLP technique (semantic role labeling). Using this approach we scored the individual sentences of each news article and took the top-scored sentences of our dataset (202) as candidate events. We then removed duplicate events, which left 58 events. Finally, we manually filtered out events less relevant to the GAStech disappearance and settled on 32 events. We used the SIMILE project's interactive tool to display the timeline and descriptions of the events.
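The sketch below illustrates only the bursty n-gram part of the sentence scoring; the semantic role labeling component is omitted. The burst measure (relative n-gram frequency in the January 20-21 reports divided by its add-one smoothed relative frequency in background reports), the sentence splitter and all names are our assumptions for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, n):
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def burst_scores(target_docs, background_docs, n=2, alpha=1.0):
    """Ratio of each n-gram's relative frequency in the target window to its
    add-alpha smoothed relative frequency in the background corpus."""
    target = Counter(g for d in target_docs for g in ngrams(tokenize(d), n))
    background = Counter(g for d in background_docs for g in ngrams(tokenize(d), n))
    t_total = sum(target.values()) or 1
    b_total = sum(background.values())
    vocab = len(set(target) | set(background)) or 1
    return {g: (c / t_total) / ((background[g] + alpha) / (b_total + alpha * vocab))
            for g, c in target.items()}


def top_sentences(target_docs, background_docs, k=5, n=2):
    """Score each sentence by the sum of its n-gram burst scores, drop exact
    duplicate sentences (after tokenisation) and return the k best."""
    scores = burst_scores(target_docs, background_docs, n)
    seen, ranked = set(), []
    for doc in target_docs:
        for sentence in re.split(r"(?<=[.!?])\s+", doc.strip()):
            key = " ".join(tokenize(sentence))
            if not key or key in seen:
                continue
            seen.add(key)
            score = sum(scores.get(g, 0.0) for g in ngrams(tokenize(sentence), n))
            ranked.append((score, sentence))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return [sentence for _, sentence in ranked[:k]]
```

Summing scores favours longer sentences; normalising by sentence length is one possible variation, and near-duplicates would still need a similarity-based filter such as the TF-IDF comparison used for the publication times.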
Fig 2.1: Events timeline showing that the GAStech meeting was held on Jan 20, 2014 from 8:00am to 10:00am.
Source news articles for the timeline events:
160.txt, 618.txt, 711.txt, 764.txt
597.txt, 348.txt
453.txt, 673.txt, 326.txt
167.txt, 692.txt, 10.txt, 537.txt
453.txt, 763.txt
326.txt, 70.txt, 563.txt
283.txt, 710.txt, 458.txt, 557.txt
78.txt, 299.txt, 215.txt, 540.txt, 292.txt, 697.txt
355.txt, 806.txt
625.txt, 490.txt
522.txt, 828.txt
805.txt, 811.txt
633.txt, 718.txt, 721.txt
660.txt, 253.txt
395.txt, 368.txt
417.txt, 322.txt
87.txt, 94.txt
429.txt, 633.txt, 718.txt, 721.txt
592.txt
567.txt, 817.txt
313.txt, 485.txt
172.txt, 386.txt
637.txt
118.txt, 744.txt
30.txt, 418.txt
276.txt, 693.txt, 676.txt, 556.txt, 624.txt
110.txt, 793.txt
219.txt, 822.txt, 261.txt, 310.txt, 824.txt, 178.txt
Fig 2.2: Displaying the description of an event.
Fig 2.3: Displaying all events in a single image using R
MC1.3 – Identify at least two possible explanations why the GAStech employees may be missing. What evidence do you have to support each of these explanations? Please limit your response to no more than three additional images and 200 words.
To explore possible reasons for the disappearance of the GAStech employees, we concentrated only on the speculations made after the disappearance. Using a bag of words representing speculation, we collected about 20 news articles from the dataset. Clustering these 20 news reports yielded 6 clusters, representing 6 distinct speculations, as shown in Figure 3.1. We then examined each cluster against the events identified above (Answer 1.2) and arrived at two possible explanations of the disappearance.
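The clustering method is not specified above, so the following is a hypothetical sketch: articles are kept if they contain any word from a small speculation vocabulary (an invented list, since the actual bag of words is not given), and the survivors are grouped by single-link clustering on term-frequency cosine similarity with an assumed threshold of 0.5. The names `SPECULATION_WORDS`, `is_speculative` and `cluster` are illustrative.

```python
import math
import re
from collections import Counter

# Hypothetical speculation vocabulary; the real bag of words is not listed.
SPECULATION_WORDS = {"kidnap", "kidnapped", "ransom", "fled", "speculate",
                     "rumor", "rumour", "suspect"}


def tokens(text):
    return re.findall(r"[a-z]+", text.lower())


def is_speculative(text):
    return bool(SPECULATION_WORDS & set(tokens(text)))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(texts, threshold=0.5):
    """Single-link clustering: two articles share a cluster if a chain of
    pairwise cosine similarities >= threshold connects them."""
    vectors = [Counter(tokens(t)) for t in texts]
    parent = list(range(len(texts)))  # union-find forest over article indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(texts)):
        for j in range(i + 1, len(texts)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(texts)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

Single-link clustering needs no preset number of clusters, which suits an exploratory step where the number of distinct speculations is not known in advance; the threshold then controls how many clusters emerge.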
Fig 3.1: Clusters based on the speculations made on the GAStech disappearance
If you closely examine the helicopter episode (articles 633.txt, 718.txt, 721.txt), you will notice two aircraft. The first left from the GAStech building (articles 453.txt and 763.txt) and its passengers appeared to be in a hurry; the second came from an unknown location and its passengers were much more relaxed than the first group. However, article 592.txt states: "Airport officials have confirmed that they were from a private company, and that this company was not GAStech". The second aircraft is therefore unrelated to the GAStech disappearance. Using the approach described above, we hypothetically provide explanations centred on the first aircraft.
POK kidnapping the GAStech employees, since it is clearly mentioned that POK demanded a ransom
GAStech executives fleeing the country with their fortune and money
Fig 3.2: Interactive tool showing the speculation clusters with their supporting news reports.